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1. INTRODUCTION 

Aggregate nuclear plant failure data is used to produce generic common cause factors that are 
specifically for use in the common-cause failure models of NUREG/CR-5485. Furthermore, the 
models presented in NUREG/CR-5485 are specifically designed to incorporate two significantly 
distinct assumptions about the methods of surveillance testing from whence this aggregate failure data 
came. What are the implications of using these NUREG generic factors to model the common-cause 
failures of aerospace systems? Herein, the implications of using the NUREG generic factors in the 
modeling of aerospace systems are investigated in detail and strong recommendations for modeling 
the common-cause failures of aerospace systems are given. 

2. HISTORICAL BACKGROUND 

2.1. Early developments 

Since the series of letters between Pascal and Fennat over a gambling dispute started the mathematical 
formulation of probability theory in 1654 [1], quantitative risk analysis has been continuously 
evolving over various applications. The development of mass production and the issue of variability 
in these applications spawned the introduction of statistical quality control in the late 1920s [2], After 
WWII, the United States became more dependent upon mass produced electronics. In the early 1950s, 
the U.S. Department of Defense initiated a study on how to increase the reliability of a ubiquitous 
piece of electronics prone to failure - the vacuum tube. From this study, came the birth of Reliability 
Engineering and quantitative reliability analysis [3]. 

2.2. Advent of PRA 

The beginnings of Probabilistic Risk Assessment (PRA) centered around the aerospace industry in the 
1960s with the development of Fault Tree Analysis (FTA) in 1961 by Bell Laboratories for the 
Minuteman Launch Control System [4]. FTA was used by the Boeing Company to study the 
Minuteman Missile System, and further used by Boeing in the design of commercial aircraft. After 
the Apollo 1 launch-pad fire in 1967, NASA contracted Boeing to perfonn a risk assessment, and a 
Fault Tree Analysis was performed for the entire Apollo system [5]. 



In the 1970s, with the number of nuclear power plants growing, the safety of reactors became an 
important policy issue. The nuclear industry borrowed the techniques used in the aerospace industry 
to perfonn a PRA, and significantly contributed to further developing these techniques during this 
time. With the publication of WASH-1400 in 1975, event trees become a part of the PRA as well as 
the concept of the “common mode” failure. However, the quantification that is provided in WASH- 
1400 for common mode failures was known to need improvement. As Joel Yellin of the Rand 
Corporation put it, “the validity of the particular procedure used to set numerical values of common 
mode failure rates and uncertainties is extremely weak.” [6] 

2.3. Common Cause Failure Analysis and the Return to NASA 

From the release of WASH-1400 forward, many developments were made in the field of quantifying 
Common Cause Failures (CCF), some of which were the beta factor model (Fleming, 1975), Marshal- 
Olkin specializations (Vesely, 1977), and the MGL and alpha-factor methods (Mosleh, et. ah, 1985), 
which culminated in the release of NUREG/CR-4780 in 1988 [7], a joint Nuclear Regulatory 
Commission (NRC) and Electric Power Research Institute (EPRI) prepared document. Most recently, 
the release of NUREG/CR-5485 in 1998 [8] updates NUREG/CR-4780. 

Another significant event in the history of PRA was the return of the use of PRA techniques, improved 
by the nuclear industry, back to NASA following the Space Shuttle Challenger accident in 1986. PRA 
continued to be used on the Space Shuttle, the International Space Station, Orbital Space Plane [9], 
and the Constellation Program. The Constellation Program has established the Constellation Program 
Probabilistic Risk Assessment Methodology Document (CxP 70017) which requires the use of alpha 
factors to be used in CCF modeling: “the Alpha-Factor parameters should be calculated based on 
relevant failure history using the method documented in NUREG/CR-5485.” 

The use of the alpha factor method presents several challenges when applied to aerospace applications 
which will be addressed in this paper. In order to properly demonstrate the issue, an overview of 
parametric CCF models will now be given which focuses on the relationship of those models to the 
alpha factors and the impact that testing schemes have on the results. 

3. COMMON CAUSE FAILURE ANALYSIS METHODOLOGY REVIEW 

3.1. Parametric Models 

Among the approaches for analyzing and quantifying the effects of dependent failures in a system- 
failure analysis, inter-component dependencies at the basic-event level that are not explicitly modeled 
in the fault tree are modeled using parametric methods. These parametric models avoid the need to 
explicitly identify and enumerate causes [10] and are classified as shock vs. non-shock and direct vs. 
indirect. We will present a brief overview of representative methods within each category, but 
develop the basic parameter model in greater detail. The reader should refer to source material for a 
complete exposition of the theoretical development and quantification of the various methods. 

The Binomial Failure Rate (BFR) model is a shock model specialized (Veseley, 1977) from a more 
general model developed by Marshall and Olkin ( 1 967). It assumes individual similar components in 
a system configuration fail randomly at a constant rate (/.) with exponential mean time to failure, and 
common-cause failures result from common system shocks that occur randomly at an assumed 
constant rate (p) according to a Poisson arrival process. Associated with the occurrence of a shock is 
the conditional probability of failure (p). The product of the shock rate times the conditional 
probability of failure given a shock produces the system failure frequency. The number of component 
failures realized given a failed component is modeled as a binomial distribution (hence the name). 
The shock rate is not directly available from the data because the shocks that do not result in a failure 
are not observable. It can be seen that three parameters (p, X, p) must be estimated regardless of the 
number of similar components in the system configuration [10]. 



Non-shock models do not model the process that generates the occurrence and number of common- 
cause failures. These are classified as direct and indirect. The basic parameter model is termed direct 
because it develops an expression for the total basic-event failure frequencies (Q t ) of each of the 
components in a common-cause group of m similar components directly from the minimum cutset 
representation of failure. Other models, such as the beta-factor model, the Multiple Greek Letter 
(MGL) model, and the alpha-factor model are tenned indirect because they are re-parameterizations of 
the basic parameter model. The utility of the indirect methods is due to practical reasons arising from 
the demands imposed on the experience database due to the number of model parameters to be 
estimated and the parameter uncertainties resulting from estimates based on sparse or non-existent 
data. All of these methods require common-cause failure data, but among the indirect methods, the 
beta-factor model is the least demanding because it requires the estimation of only one common-cause 
parameter (P) in addition to the independent component failure rate (/.[) to model the total component 
failure rate (L T ). 

Following the presentations in NUREG/CR-4780 and NUREG/CR-5485, the basic parameter model is 
best explained with an example using a two-out-of-three parallel configuration of similar components 
(A, B, and C). Without considering common-cause, the simple 2/3 parallel system configuration is 
shown below in Figure 1 as a fault tree diagram. 



Solving the above tree for the minimum cutsets we obtain the following Boolean expression (with 
notation defined below Eq. 2) for system failure: 

S = A,*B I + A I *C, + B I *C, (1) 

To incorporate common-cause into the tree, the basic events need to be converted to gates and 
expanded as sub trees to represent the ways in which each of the three components can fail with 
common-cause involvement. This is illustrated in the Figure 2 below. 



Figure 2 2/3 with CCF 







The Boolean representation of the total failure frequency of component A is 


A x — Ai + Cab + C A c + C A bc 


( 2 ) 


Where 

A t = Total failure of components A. 

Aj = Failure of component A from independent causes. 

Cab = Failure of components A and B (and not component C) from common causes. 

Cac = Failure of components A and C (and not component B) from common causes. 

Cabc = Failure of components A, B and C from common causes. 

The reduced Boolean representation of the system failure in tenns of these cutsets is 

S = A^B, + Aj*C t + Bj*Ci + Cab + C AC + C BC + C ABC (3) 

For a one out of three success criteria, the Boolean expansion would have produced events such as 
C\is*C A c, which have questionable validity. One solution is to define these events as mutually 
exclusive, which implies P(C A b*C A c) = 0. Defined in this way, the Boolean representation of the total 
failure (A T ) of component A is a partition of the failure space of A into mutually exclusive event sets 
according to the explicit impact on the other components in the common-cause group. 

Using the rare-event approximation, P(AUB) ~ P(A)+P(B) and calculating probabilities yields 

P(S) = P(A I *B I ) + P(Ai*Q) + P(B,*Q) + P(Cab) + P(C AC ) + P(C BC ) + P(Cabc) ~ (4) 

P(A I )*P(B I ) + P(A I )*P(C I ) + P(B I )*P(Ci) + P(C AB ) + P(Cac) + P(C B c) + P(C AB c) 

Since the components A, B, and C are identical, an exchangeability (symmetry) argument is made for 
the components A, B, and C, which allows for simplification and reduction of the above expression. 
Accordingly, we introduce the following variables: 

P(A I ) = P(B I ) = P(Ci) = Q, 

P(C A b) = P(Cac) = P(Cbc) = Q2 (5) 

P(Cabc) = Qi 

Then substitute the variables in Eq. 5 into the expression in Eq. 4 to get the basic parameter model for 
the system failure frequency 

P(S) ~ 3Qi 2 + 3Q 2 + Q 3 (6) 

The expression for the total failure frequency of a component (Eq. 2) is easily generalized. For a 
configuration of m similar components in a redundant configuration, define the sequence of k 
probabilities 

(T ~ Probability of a common-cause basic event involving k specific components in a common- 
cause component group size m, (1 < k < m) 

For a component-group size of m and for a given component (A), there are (^Ti) distinct subsets of k 
elements that include component A. Clearly, the total component-level failure frequency for each of 
the components in the configuration is given by 

Qt= y n (T- - ! )q™ (?) 

‘—‘k= 1 

In the example given above for each of the components A, B, and C, the total failure frequency is 



( 8 ) 


P(A t ) = P(B t ) = P(C T ) = Q t = Qf } + 2Q ( 2 3) + Q® 

It’s important to note that numerical estimates of Q| ( nl ' ) derived from the generic common-cause 
database of component failure data is aggregated from multiple nuclear power-plant databases. There 
are uncertainties due to judgments inherent in the data-analysis process. Assigning failures to 
common-cause events also involves judgments. This is due in part because common-cause events 
occur in rough proximity over time and the classification is based on a heuristic method. Other 
sources of uncertainty are due to differences in nuclear power-plant design, operating procedures, 
maintenance procedures, hardware, and testing schemes. Consequently, the estimates of derived 
from the failure database are uncertain. Furthermore, the analysis process produces estimates of the 
numbers of common-cause failure events (n 1 ,n 2 , ...,n k ). However, the number of demands is not 
recorded in the database; therefore, it must be estimated from power-plant records. Assumptions must 
be made about the testing scheme; since it directly impacts the number of challenges on a component 
group per surveillance test interval. The testing scheme affects the estimate of the number of demands 
because most of the demands on a redundant standby system are due to surveillance testing. There are 
two surveillance testing schemes to consider — non-staggered and staggered. These are briefly 
explained below. 

3.2 Staggered versus Non-staggered Testing 

The power plant’s technical specification requires that all m components in a component group be 
tested at least once in the surveillance interval, for example monthly. The actual interval may vary 
from one component type to another. Within a surveillance interval there may be several test episodes 
depending on the testing scheme used. In non-staggered testing, all m components are tested in a 
single test episode. Notice that in non-staggered testing; all (™) distinct component subsets of k 
components (1 < k < m) are challenged in a single test episode. Except for classifying failure events 
that occur in consecutive episodes using impact vectors, if a demand failure occurs or is discovered 
during surveillance testing, the occurrence of a common-cause event will be discovered in a 
single test episode. 

If staggered testing is used, there are multiple test episodes per surveillance interval. Typically, one 
component is tested per episode, but all m components are tested within the test interval. For example, 
suppose we test component A and it succeeds, to determine the number of k-component subsets 
challenged in the test episode, we infer that certain combinations of common-cause events have not 
occurred. For example, in addition to A] (not A]), we infer C AB , C AC , and C ABC . So, in this case, 
we test component A, which by inference challenges ) additional distinct k-component subsets of 
the m component group for ( 1 < k < m) in a single test episode. 

If we test component A and find that it failed, we cannot determine if other components have also 
failed due to a common-cause unless we test them. Therefore, all m-1 additional components in the 
group are also tested within that same test episode. Clearly, as for the non-staggered scheme, all (™) 
distinct k-component subsets of the m-component group for ( 1 < k < m) are then challenged in a single 
test episode. 

Let’s summarize how non-staggered versus staggered testing impacts the estimate of the number of 
demands. For the class of similar redundant components in an m-component group, let N TS be the 
total number of test intervals derived from the test specification represented in the database from all 
power plants, let N TE be the total number of test episodes, and N D be the total number of component 
demands. Assuming all plants use non-staggered testing, N TE = N TS and N D — m N TE . Now define 
N k to be the total number of k-component subsets challenged. Since all m components in the group 
are demanded in a single episode, every k-component subset is also challenged. Therefore, 



(9) 


Nk - (^)N te - (™)Nts 


Notice that X/’Li (/") = 2 m — 1. This is the total number of non-empty subsets contained in a set of m 
elements. The basic parameter model explicitly assumes the testing scheme is non-staggered. The 
maximum likelihood estimator (MLE) for is 

Q ( fe m) = n V Nfe d°) 

where n k is the number of common-cause basic events involving k components derived from the 
experience database. Substituting 


p\( m ) _ n k 

h!k - / 


©N 


TE 


GO 


Then, substituting Q! ( m) into the basic parameter model equation we get an estimate for Q t 


_ \ i"k-i) n k 


Qt = r^:l)Q k 


Z rri 

(T)Nt 

k=l 


( 12 ) 


Substituting the identity 


fm\ m rm-l\ 

v k ) — k vk-l) 


(13) 


for the tenn in the denominator of Eq. 12 yields 


Qt = 



Ck-tH 

r(k->TE 


ym k n k 

^k=l m N TE 


N-p 

Nd 


(14) 


where N D = mN TE and the total number of component failures is given by N r 

N T =Yk=ikn k (15) 

For the staggered testing scheme, the NUREGs proceed to develop an expression for the total failure 
frequency (Q t ) as follows. In each test episode, only one component is tested unless that component 
is found to have failed in which case the remaining m-1 components are tested. To calculate the total 
number of challenges of k components and the total number of demands it is necessary to examine two 
cases. 


Case 1: The component tested in an episode is found to be good. The total number of successful 
demands represented in the database is = N TE — N CCBE where the total number of common-cause 
basic events is 

Nd = N C cbe = n i (16) 

This leads to the following expression for the number of k-component subsets challenged in the group 
of m components in one test episode. 


^ = G£) 


(17) 



Recall, each time a component is found to be good following a test or operational demand, none of the 
common-cause events containing that component can occur, and there are N[ ( such events. 

Case 2: The component fails in a test episode. If an episode test or operational demand reveals a failed 
component, of which there are N CCBE instances represented in the database, then all of the m-1 
remaining components in the group are tested. Assume these additional tests are done in the same 
single test episode. Notice that the number of challenges for this test episode is the same as a non- 
staggered episode. Hence, the expression for the number of challenges on each k-subset in the group 
of m components in one test interval is 


K = ( 18 ) 

Eq. 18 is not N[ ( ' = (™), since it is assumed that in the case of k+1 failures occurring in a single test 
episode, the failures would be mapped to a single common-cause and not a combination of an 
independent cause and a common-cause impacting the remaining k components. Therefore, combining 
Case 1 and Case 2 the total number of challenges of k-component subsets is given by 

Nk — + NJX — (Nte _ N ccbe )(“_ 1 1 ) + N ccbe (“_ : | l ) 

- (k-l)[(^TE - N ccbe ) + N ccbe ] = (19) 

Then, substituting N TS = mN TE into the above result 


N fc - m (T-i) N Ts - k (™) N TS 

Therefore, the staggered estimate for the basic parameter qL” 1 '* is 


p\( m ) _ n k 


k (k) N TE 


( 20 ) 


( 21 ) 


To calculate the approximate number of demands in the case of staggered testing, combine Case 1 and 
Case 2 and recognize that N TE » (m — 1)N CCBE to get 

Nd — N b + mNfJ = (N te — N ccbe ) + mN CCBE — N TE + (m — 1)N CCBE ~ N TE (22) 


Substituting 


into the basic parameter model equation we get an estimate for Q t : 


Qt = 



(T-TiH g gUnfc Z£ = 1 n k 

'k-l) N TE N te N d 


(23) 


Notice, the estimate for the total failure frequency in Eq. 23, is equal to the total number of common- 
cause basic events (N CCBE ) in the database divided by the approximate total number of demands. 
Rather than this being the failure rate of a component it is a basic event frequency. Note the contrast 
between Eq. 23 and the non-staggered estimate of Q t in Eq. 14. 


3.2. Alpha Factor Model 


The alpha-factor model for a component group of size m is a sequence of m conditional frequencies 
(ai, a 2 , ... a m ), such that given the occurrence a of common-cause basic event, a k is the probability 



that the event consists of k failures, 
parameter model is 


The definition of the oik in tenns of the parameters of the basic 


a k 


r 


re)Qr 

®Ql m) 


(24) 


Applying the definition for a k in Eq. 24 along with the definition of Q t in Eq. 7, we can express the 
basic parameters Q| ( "^ in tenns of a k and Q t . First, transpose the denominator in Eq. 24 to the other 
side of the equation, then substitute the identity, Eq. 13, to get 

(T)Qk m) = ( 25 ) 

‘—‘k = 1 


Sum both sides over k ,( 1 < k < m) 

Zk =1 ka k V C”)Qk m) = m Sr=i(ri)Qk m) - m Qt 

£—‘k=l 

Letting a T = ka k and rearranging we have 


Z T71 

k = 1 


©Qr - ^ 




(26) 


(27) 


Substituting the right side of the equation into the definition of a k and using the identity, Eq. 13, yields 
an expression for Q^"" 1 , (1 < k < m) 


n (m) _ k afc 
Yh rm- l\ 

K tfc-i) a t 


Qt 


(28) 


Notice this equation is valid for both staggered and non-staggered testing. This re-parameterization of 
the basic parameter model is done in part to simplify data analysis; i.e., the alpha factors can be 
estimated from the common-cause frequency data without needing to know the number of demands as 
follows 


a k = 


'^k=i n k 


The derivation for Eq. 29 can be found in NUREG/CR-4780 [7]. 


(29) 


4. ISSUES WITH THE STAGGERED METHODOLOGY 

We have now shown the development of the basic parameter model and the alpha-factor method for 
both the non-staggered and staggered testing schemes. Recall, the contrast between Eq. 14 and Eq. 23; 
and between Eq. 1 1 and Eq. 2 1 . 


Table 1 Basic parameter estimates 


Non-Staggered 

Staggered 

N t 

* ZZLr n k 
Qt “ n d 

fj( m ) — n k / 

Qk ' /©Nte 

rj( m ) — n k / 

Q - “ /k©N TE 




The staggered estimates are not consistent with Eq. 28 and Eq. 29. Here, we will investigate in detail 
the cause of the anomalous result for the staggered case and will illustrate the problems that arise from 
using the staggered methodology. 

4.1. Issue Statement 


The issues with the staggered methodology can be summarized as follows: The staggered estimate for 
the basic parameters is based on a particular counting argument. This counting argument yields a 
greater estimate in the number of challenges on common-cause groups within a test interval which 
alters the frequencies of failure of the various common-cause groups. This reveals a subtly in the 
definition of the alpha-factor model. This subtly is not captured in the representation of the MLE 
ofa k , a k . It has been, until now, incorrectly concluded thata k is independent of the testing scheme 
assumption. The correct conclusion is that a k is dependent upon the frequency of failure of a k- 
component group per system demand. And, this frequency is dependent upon both the testing scheme 
assumption and the database of failures, {n k \k = 1, ... , mj. 


In order to clarify the issue, recall that the definition of a test interval is a period of time in which each 
component of a system is tested at least once. In addition, the definition of a system demand is a test 
that challenges each combination of failures exactly once. The distinction between these two is 
important and will become apparent in the following development. 


In the context of a probabilistic model, we can consider a system demand to be an experiment that 
consists of a series of tests, precisely one for each of component, in one or more episodes along with 
an analytical step that maps component test results to a single outcome. The sample space for system 
demand on a three component system is given below: 


n 


3 _ 


{F , Aj,Bj, C[,A[ * B I ,A l * Cj,Bi * C h C AB , C AC , C bc ,A i * C bc ,B i * C AC , Cj * C AB , 


Aj * Bj * Cj, C ABC } 


(30) 


where F represents that no failures occurred. Note that, in this model, the sample space of outcomes 
for a system demand is independent of any testing scheme assumptions. This is because the sample 
space for a system demand is a well-defined characteristic of the basic parameter model. To be 
consistent with the definition of an experimental outcome, only one outcome is possible for each test 
interval. 


We now decompose the outcomes of a system demand into the parts that are used to estimate the 
model parameters, Q fc . Each outcome from a system demand is expressed as a combination of basic 
outcomes from the sample spaces shown below. 

Hi = {F 1 ,Ai,Bi,C I }, fl.2 = {(fy C AB , C AC , C BC }, Cll = {F 3 ,C abc } (31) 

where fy, i = 1,2,3 represents no failure of a common-cause group of size i. 

From these sample spaces, we can see that from a system demand, there are three opportunities for an 
individual component to fail independently; three opportunities for a common-cause group of size two 
to fail, and so forth. The method of enumerating the challenges to the members of ftf , (i = 1,2,3) for 
a system demand is straightforward and illustrated below in the case of a non-staggered test interval. 



Table 2 Non-staggered using Counting Method 1 


Non-staggered Test Interval 

Test Episode 

ABC 

Ai 

B, C, 

Qb Qc C B c 

Qbc 


Components 

Events Challenged 

Counting Method 1 

MiH 




It is clear that each outcome in ftf, (i = 1,2,3) is challenged exactly once per test interval. Notice, 
that a non-staggered test interval satisfies the definition of a system demand. Consequently, the 
estimator, Qj^, from Eq. 10, quantified by Counting Method 1, has an intuitive interpretation that 
agrees with the definition of the basic parameter, Qj ( nl) . 

However, for the staggered case, the NUREG’s adopt a different approach to enumerating the number 
of challenges to common-cause groups within a single test interval. Recall, that within each test 
interval, all components are tested exactly once unless a failure is observed. If a failure is observed in 
a test episode, then the remaining m-1 components are also tested in that episode. The result is 
additional challenges to certain common-cause groups for each test episode. The additional challenges 
that are taken into account are illustrated in the table below: 


Table 3 Staggered using Counting Method 2 


Staggered Test Interval 

Test Episode 1 

A 



Ai 



Qb 

Qc 


Qbc 

Test Episode 2 


B 



B, 


Qb 


Cbc 

Qbc 

Test Episode 3 



C 



c, 


Qc 

Cbc 

Qbc 


Components 

Events Challenged 

Counting Method 2 

CO 

II 

T— 1 

II 

525 

N 2 = 2 

6H 



The table above illustrates that the challenges to k-component groups are counted k times each. We 
now come to the fundamental issue with the staggered model. It must be understood that for each 
system demand, each k-component group is given exactly one opportunity to fail. Therefore, using 
Counting Method 1, the frequency of failure of a k-component group per test interval equals the 
frequency of failure of a k-component group per system demand. However, using Counting Method 2, 
the frequency of failure of a k-component group per test interval equals k times the frequency of 
failure of a k-component group per system demand. For example, in Table 4, note the different 
representations of the frequency of failure of a k-component group using both counting methods, 


Table 4 Frequency of Failure Representations 


Frequency of Failure 

Per Challenge 

Per Test Interval 

Per System Demand 

Non-Staggered - 
Counting Method 1 

n (m) _ n k 

Qk (?)*rs 

( 

m\ 

k) 

c £ 
II 

O' 


/771 

u 

x. £ 
C £ 

II 

jL 

o 

Staggered - 
Counting Method 2 

<o 

*■? 

II 

3 

3: 

3 

k- 

/m 

U 

c £ 
II 

O 

( 

771 \ 

k ' 

„(m) ( n k/k) 

" N TS 


The number of challenges per interval depends upon the testing scheme assumption. In Table 4, the 
two frequencies of failure per system demand are the correct quantities to use in the respective non- 






staggered and staggered estimates of a k . Therefore, the fonnula for a k under the staggered 
assumption is a k = and the fonnulae in Table 5, below, are consistent with Eq. 28. 

*-‘k=: iA n k/k) 


Table 5 Basic parameter and alpha factor estimates 


Non-Staggered 

Staggered 

N t 

q ‘ = *7 

ZZLin* 

Qt_ N D 

Qk ' /©Nte 

rj( m ) — n k / 

Q - “ /k(")N TB 

~ _ U k 

a k ym 

hk=l n k 

(n k /k) 

“* “ 2?.i(n k /k) 


Therefore, as seen in Table 5, the alpha-factors estimates are not necessarily only dependent upon the 
total number of failures of the various common-cause groups (as they are in the non-staggered case). 
Since, the alpha-factors represent, given the occurrence of an event from a system demand, the 
conditional frequency of failures of the various common-cause groups per system demand; and, 
because the failures are detected during test intervals, it is necessary to know the relative number of 
demands per common-cause group size per test interval. This has not been stated clearly until now. 

5. NASA USE OF ALPHA FACTOR MODEL 

The Constellation Program implements the NASA PRA guidance document [11] through the 
Constellation Program PRA Methodology Document CxP 70017, which specifies that common-cause 
be modeled using the Alpha Factor Model. Discussions among the Program’s PRA analysts regarding 
the most appropriate testing scheme assumptions to use in the Constellation PRA were useful; but, 
ultimately failed to reach a consensus. Consequently, the Constellation PRA allowed each of the 
elements to select either non-staggered or staggered testing assumptions according to the method that 
best represented how their element hardware is to be tested. 

However, it is improper to infer two distinct interpretations of the generic data within a system PRA. 
The reasons for this are clear. Using the staggered method, results in a lower incidence of common- 
cause. But, these benefits are an artifact of the model’s assumptions. The difference that is captured 
by the mixed use of non-staggered and staggered methods is merely the difference that results from 
two interpretations of how generic data were collected less the effects of the error in the staggered use 
of the non-staggered a k . 

PRA trade studies used to support Risk Informed Decision Making (RIDM) are being perfonned on 
the Constellation and Space Shuttle Programs. One of the main issues we see in this area regarding 
common-cause modeling is the use of both non-staggered and staggered testing assumptions to model 
common-cause within a single trade study. The trade study alternative that assumes staggered testing 
takes credit for a lower incidence of common-cause and therefore, a lower risk result than the 
alternative that assumes non-staggered testing. This practice biases the risk results by assuming the 
risk benefit going in and then taking credit for it in the results. 

Aside from the significant amount of additional data that can be obtained with the relatively small 
number of additional tests that stem from the implementation of a staggered testing schedule, there 
may be real risk benefits to staggered testing because common-cause coupling factors induced by 
testing are reduced when similar components are tested in separate episodes. For example, the tests 
may be conducted by different operators using staggered testing as opposed to non-staggered testing. 
Common-cause failures due to same-operator would be mitigated using a staggered testing scheme. 





These benefits, however, are not reflected in the basic parameter model. In order to quantify them, a 
cause level analysis of coupling factors should be performed. 


6. CONCLUSION AND RECOMMENDATIONS 

A thorough review of common-cause modeling methods should be conducted within NASA programs 
and projects of PRA. In light of the issues with the staggered methodology, the non-staggered method 
is the recommended approach. Instances in which models are found to be using the staggered method 
should be converted to the non-staggered method. This should be done regardless of the actual testing 
scheme used in surveillance testing or in pre-flight checkout. 

The use of the staggered method results in an artificial reduction in model risk estimates and it biases 
trade-study results in favor of the trade alternative that uses the staggered method. The use of the 
staggered method results in a low risk estimate relative to the non-staggered method. A mixed use of 
the non-staggered and staggered methods within a trade study is particularly misleading because of the 
artificially induced bias introduced. Therefore, the decision-maker’s ability to make correct decisions 
is undermined. 

Future revisions of NASA documentation should incorporate a discussion of parametric methods with 
revisions that reflect the correct development of models. The use of the terms non-staggered and 
staggered with reference to parametric models should be eliminated because the parametric models do 
not explicitly model the root causes that lead to the difference in risk due to testing schemes. The 
terms should be reserved just for describing the testing schemes themselves. 
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